perm filename CNTOUR[0,BGB]6 blob
sn#116853 filedate 1974-08-27 generic text, type C, neo UTF8
{⊂C;<N;αIMAGE CONTOURING.;λ30;P82;I425,0;JCFA} SECTION 7.
{JCFD} VIDEO IMAGE CONTOURING.
{λ10;W250;JAFA}
7.0 Introduction to Image Analysis.
7.1 CRE - An Image Processing Sub-System.
7.2 Thresholding.
7.3 Contouring.
7.4 Polygon Nesting.
7.5 Contour Segmentation.
7.6 Related and Future Image Analysis.
{λ30;W0;I900,0;JU;FA}
⊂7.0 Introduction to Image Analysis.⊃
Simply put, image analysis is the inverse of image synthesis.
From this point of view, the usually difficult question of "analysis
into what?" is answered by answering the question "synthesis
from what?". Since a 3-D geometric model is adequate (and necessary)
for synthesizing digital television pictures, it is reasonable to
suppose that such a model is an appropriate subgoal in the analysis
of television pictures. Such an analysis into a 3-D model would
provide a useful data reduction as well as a convenient
representation for solving robotics problems such as manipulation,
navigation and recognition. This approach to image analysis is
somewhat heretical; the orthodox method is to extract features from
2-D images and use those features directly for the desired
task. On the other hand, vision by inverse computer graphics may be
viewed as an extreme form of feature finding, involving the
extraction of a set of basic geometric features which are combined to
form a superfeature, a 3-D model. The rest of this introduction
enumerates some of the kinds of information available in a sequence
of images and some of the kinds of data structures for representing
images. An image is a 2-D data structure representing the contents of
a rectangle from the pattern of light formed by a thin lens; a
sequence of images in time is called a film.
Three basic kinds of information in an image are photometric
information, geometric information, and topological information.
Fundamentally, geometry concerns distance measure. The geometry of
an image is based on coordinate pairs that are associated with the
elements that form the image. From the coordinates such geometric
properties as length, area, angle and moments can be computed.
Photometry means light measure, although physical measurements of
light may include power, hue, saturation, polarization and phase;
only the relative power between points of the same image is easily
available to a computer using a television camera. The taking of
color images is possible at Stanford by means of filters; however,
the acquisition of color is inconvenient and has not been seriously
pursued in the present work. Finally, topology has to do with
neighborhoods, what is next to what; topological data may be
explicitly represented by pointers between related entities, or
implicitly represented by the format of the data.
Three basic kinds of image data structures are the raster,
the contour map and the mosaic. A raster image is a two-dimensional
integer-valued array of pixels; a pixel ("picture element") is a
single sample position on a vidicon. Although the real shape of a
pixel is probably that of a blunt ellipse, the fiction that pixels
tessellate the image into little rectangles will be adopted. For other
theoretical purposes the array is assumed to be formed by sampling
and truncating a two-dimensional, smooth, infinitely differentiable,
real-valued function. A contour image is like a geodesic contour map:
no two contours ever cross and all the contours close. A mosaic
image (or tessellation) is like a ceramic tile mosaic: no two regions
ever overlap and the whole image is completely covered with tiles.
Further useful restrictions might be made concerning whether it is
permitted to have tiles with holes surrounding smaller tiles or
whether it is permitted for a tile to have points that are thinner
than a single pixel.
Given a raster image, the usual visual analysis approach is
to find the features. One canonical geometric image feature is
called the <edge> and the places where edges are not found are called
<regions>. For a naive start, an edge can be defined as a locus of
change in the image function. Edges and regions are complementary
sides of the same slippery concept; the concept is slippery because
although a continuous function of two variables and a graph of edges
are each well-known mathematical objects, the conversion of one into
the other is a poorly understood process that depends greatly on one's
motives and resources. A computational definition of the region/edge
notion would include a procedure for converting a raster
approximation of an image function into a region/edge representation
based on parameters which are conceptually elegant.
⊂7.1 CRE - An Image Processing Sub-System.⊃
The acronym CRE stands for "Contour, Region, Edge". CRE is
a solution to the problem of finding contour edges in a sequence of
television pictures and of linking corresponding edges and polygons
from one picture to the next. The process is automatic and is
intended to run without human intervention. Furthermore, the process
is bottom up; there are no inputs that anticipate the content of the
given television images. The output of CRE is a 2-D contour map data
structure which is suitable input to the 3-D modeling program,
GEOMED. Five design choices that determine the character of CRE are
listed in Box 7.1. The design choices are ordered from the more
strategic to the more tactical; the first three choices being
research strategies, the latter two choices being programming
tactics. Adopting these design choices led to image contouring and
contour map structures similar to those of Krakauer (71) and Zahn (66).
{|;JA;λ10;}
BOX 7.1 {JC} CRE DESIGN CHOICES
1. Dumb vision rather than model driven vision.
2. Multi image analysis rather than single image analysis.
3. Total image structure imposed on edge finding; rather
than separate edge finder and image analyzer.
4. Automatic rather than interactive.
5. Machine language rather than higher level language.
{|;λ30;JUFA}
The first design choice does not refer to the issue of how
model dependent a finished general vision system will be (it will be
quite model dependent), but rather to the issue of how one should
begin building such a system. The best starting
points are at the two apparent extremes of nearly total knowledge of
a particular visual world or nearly total ignorance. The first
extreme involves synthesis (by computer graphics) of a predicted 2-D
image, followed by comparing the predicted and a perceived image for
slight differences which are expected but not yet measured. The
second extreme involves analyzing perceived images into structures
which can be readily compared for near equality and measured for
slight differences; followed by the construction of a 3-D geometric
model of the perceived world. The point is that in both cases images
are compared, and in both cases the 3-D model initially (or finally)
contains specific numerical data on the geometry and physics of the
particular world being looked at.
The second design choice, of multi image analysis rather than
single image analysis, provides a basis for solving for camera
positions and feature depths. The third design choice solves (or
rather avoids) the problem of integrating an edge finder's results
into an image. By using a very simple edge finder, and by accepting
all the edges found, the image structure is never lost. This design
postpones the problem of interpreting photometric edges as physical
edges. The fourth choice is a resolution to write an image processor
that does not require operator assistance or manual parameter tuning.
The final design choice of using machine language was for the sake of
implementing node link data structures that are processed one hundred
times faster than LEAP, ten times faster than compiled LISP and that
require significantly less memory than similar structures in either
LISP or LEAP. Furthermore, machine code assembles and loads faster
than higher-level languages, and it can be extensively
fixed and altered without recompiling.
It is my impression that CRE itself does not raise any really
new scientific problems; nor does it have any really new solutions to
the old problems; rather CRE is another competent video region edge
finding program with its own set of tricks. However, it is further
my impression that the particular tricks for nesting and comparing
polygons in CRE are original programming techniques. As a part of the
larger feedback system, CRE is a necessary, but not entirely
satisfactory implementation of pure bottom up image analysis.
CRE consists of five steps: thresholding, contouring,
nesting, smoothing and comparing. Thresholding, contouring and
smoothing perform conversions between two different kinds of images.
Nesting and comparing compute topological relationships among the
elements of the given image representations. In summary, the major
operations and operands are as listed in Box 7.2; the VIC-images are
Video Intensity Contour images and the ARC-images are contours that
have been smoothed.
{|;T300,600,850;JAFA;λ10;}
BOX 7.2 {JC} CRE DATA TRANSFORMATIONS.
~MAJOR OPERATION~ ~OPERAND~ ~RESULT~.
1. THRESHOLDING: 6-BIT-IMAGE, 1-BIT-IMAGES.
2. CONTOURING: 1-BIT-IMAGES, VIC-IMAGE.
3. NESTING: VIC-IMAGE, NESTED-VIC-IMAGE.
4. SMOOTHING: VIC-IMAGE, ARC-IMAGE.
5. COMPARING: IMAGE & FILM, FILM.
{|;T-1;JU;λ30;}
The initial operand is a 6-bit video raster, which in the
present implementation is coerced into a window of 216 rows by 288
columns; intermediate operands consist of 1-bit rasters named PAC,
VSEG and HSEG which are explained below, as well as a raster of links
named SKY which is used to perform the polygon nesting. The magic
window size 216 by 288 was derived by considering the largest product
of powers of two and three that would fit within a video image. The
final result is a node/link structure composed of several kinds of
nodes: vectors, arcs, polygons, lamtens (lamina inertia tensors),
levels, images and the film node.
Although the natural order of operations is sequential from
image thresholding to image comparing, in order to keep memory size
down the first four steps are applied one intensity level at a time,
from the darkest cut to the lightest cut (only nesting depends on
this sequential cut order); comparing is applied to whole images.
Figure 7.1 illustrates an initial video image and its corresponding
contour image. The contoured image consists of thirteen intensity
levels and took 45 seconds to compute (on a PDP-10, two microsecond memory).
The final CRE data structure was composed of 1996 nodes.
⊂7.2 Thresholding.⊃
Thresholding, the first and easiest step of CRE, consists of
two subroutines, called THRESH and PACXOR. THRESH converts a 6-bit
image into a 1-bit image with respect to a given threshold cut level
between zero for black and sixty-three for light. All pixels equal to
or greater than the cut map into one; all pixels less than the
cut map into zero. The resulting 1-bit image is stored in a bit
array of 216 rows by 288 columns (1728 words, 36 bits per word)
called the PAC (picture accumulator) which was named in memory of
McCormick's ILLIAC-III. After THRESH, the PAC contains blobs of
bits. A blob is defined as "rook's move" connected; that is
every bit of a blob can be reached by horizontal or vertical moves
from any other bit without having to cross a zero bit or having to
make a diagonal (bishop's) move. Blobs may of course have holes;
equivalently, a blob always has one outer perimeter polygon and may
have one, several or no inner perimeter polygons. This blob and hole
topology is recoverable from the CRE data structure and is built by
the nesting step.
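The thresholding step and the rook's-move connectivity rule can be sketched in present-day Python (the actual CRE is PDP-10 machine code; `thresh` mirrors THRESH, while `count_blobs` is only an illustration of the connectivity rule, not a CRE subroutine):

```python
from collections import deque

def thresh(image, cut):
    """THRESH sketch: map pixels >= cut to one, the rest to zero (6-bit scale, 0..63)."""
    return [[1 if v >= cut else 0 for v in row] for row in image]

def count_blobs(pac):
    """Count "rook's move" connected blobs: horizontal and vertical moves only,
    so diagonally touching bits do NOT belong to the same blob."""
    nr, nc = len(pac), len(pac[0])
    seen = [[False] * nc for _ in range(nr)]
    blobs = 0
    for r in range(nr):
        for c in range(nc):
            if pac[r][c] and not seen[r][c]:
                blobs += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:                       # flood one blob by rook moves
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < nr and 0 <= nx < nc and pac[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs
```

A diagonal cross of bright pixels thresholds into four separate blobs, since bishop's-move contact does not connect.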
{π} Next, PACXOR copies the PAC into two slightly larger bit
arrays named HSEG and VSEG. Then the PAC is shifted down one row and
exclusive ORed into the HSEG array; and the PAC is shifted right one
column and exclusive ORed into the VSEG array to compute the
horizontal and vertical border bits of the PAC blobs. Notice that,
technically, this is the very heart of the edge finder of CRE:
namely, PACXOR is the mechanism that converts regions into edges.
Edge tracing is the only operation CRE performs on the 1-bit rasters;
although Boolean image processing has caught the eye of many vision
programmers (perhaps because it resembles an array automaton or the
game of Life), one rapidly discovers that raster operations alone are too
weak to do anything interesting that can't already be done better
analytically in a raster of numbers or topologically in a
node/link data structure.
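The shift-and-XOR border extraction can be sketched as follows, with Python lists of 0/1 bits standing in for the packed 1-bit PDP-10 arrays (explicit loops stand in for the word-wide shifts and XORs of the real PACXOR):

```python
def pacxor(pac):
    """PACXOR sketch: HSEG[r][c] marks a horizontal crack between pixel rows
    r-1 and r; VSEG[r][c] marks a vertical crack between pixel columns c-1
    and c.  A crack bit is set exactly where a pixel differs from its
    neighbor, i.e. on the border of a blob."""
    nr, nc = len(pac), len(pac[0])
    hseg = [[0] * nc for _ in range(nr + 1)]      # one extra row of cracks
    vseg = [[0] * (nc + 1) for _ in range(nr)]    # one extra column of cracks
    for r in range(nr + 1):
        for c in range(nc):
            above = pac[r - 1][c] if r > 0 else 0
            below = pac[r][c] if r < nr else 0
            hseg[r][c] = above ^ below            # XOR of image with itself shifted down
    for r in range(nr):
        for c in range(nc + 1):
            left = pac[r][c - 1] if c > 0 else 0
            right = pac[r][c] if c < nc else 0
            vseg[r][c] = left ^ right             # XOR of image with itself shifted right
    return hseg, vseg
```

A single set bit yields exactly two horizontal and two vertical border bits, its four-crack perimeter.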
⊂7.3 Contouring.⊃
Contouring converts the bit arrays HSEG and VSEG into
vectors and polygons. The contouring itself is done by a single
subroutine named MKPGON, "make polygon". When MKPGON is called, it
looks for the uppermost left non-zero bit in the VSEG array. If the
VSEG array is empty, MKPGON returns a NIL. However, when the bit is
found, MKPGON traces and erases the polygonal outline to which that
bit belongs and returns a polygon node with a ring of vectors. The
MKPGON trace can go in four directions: north and south along vertical
columns of bits in the VSEG array, or east and west along horizontal
rows of the HSEG array. The trace starts by heading south until it
hits a turn; while heading south MKPGON must check first for a turn to
the east (indicated by a bit in HSEG), next for no turn (continue
south), and last for a turn to the west. When a turn is encountered
MKPGON creates a vector node representing the run of bits between the
previous turn and the present turn. The trace always ends heading
westbound. The outline so traced can be either the edge of a blob
or a hole; the two cases are distinguished by looking at the
VIC-polygon's uppermost left pixel in the PAC bit array.
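The trace can be sketched in Python. This is a much-simplified illustration: it ignores blob/hole discrimination and contrast accumulation, and it generalizes the southbound turn priority (east, straight, west) into a left-straight-right rule for all four headings. The `cracks` helper rebuilds the PACXOR-style border arrays so the sketch is self-contained:

```python
def cracks(pac):
    """Border bits a la PACXOR: hseg[r][c] between rows r-1,r; vseg[r][c] between cols c-1,c."""
    nr, nc = len(pac), len(pac[0])
    hseg = [[0] * nc for _ in range(nr + 1)]
    vseg = [[0] * (nc + 1) for _ in range(nr)]
    for r in range(nr + 1):
        for c in range(nc):
            hseg[r][c] = (pac[r - 1][c] if r else 0) ^ (pac[r][c] if r < nr else 0)
    for r in range(nr):
        for c in range(nc + 1):
            vseg[r][c] = (pac[r][c - 1] if c else 0) ^ (pac[r][c] if c < nc else 0)
    return hseg, vseg

def mkpgon(hseg, vseg):
    """Trace and erase one polygon outline, MKPGON style: return a list of
    (heading, run length) vectors, or None when VSEG is empty."""
    start = next(((r, c) for r, row in enumerate(vseg)
                  for c, bit in enumerate(row) if bit), None)
    if start is None:
        return None
    def step(r, c, d):
        # try to leave corner (r, c) in direction d, erasing the crack used
        if d == 'S' and r < len(vseg) and vseg[r][c]:
            vseg[r][c] = 0; return r + 1, c
        if d == 'N' and r > 0 and vseg[r - 1][c]:
            vseg[r - 1][c] = 0; return r - 1, c
        if d == 'E' and c < len(hseg[0]) and hseg[r][c]:
            hseg[r][c] = 0; return r, c + 1
        if d == 'W' and c > 0 and hseg[r][c - 1]:
            hseg[r][c - 1] = 0; return r, c - 1
        return None
    LEFT = {'S': 'E', 'E': 'N', 'N': 'W', 'W': 'S'}
    RIGHT = {v: k for k, v in LEFT.items()}
    (r, c), heading = start, 'S'
    vseg[r][c] = 0                     # the first move is always south
    r, run, vectors = r + 1, 1, []
    while True:
        for d in (LEFT[heading], heading, RIGHT[heading]):
            nxt = step(r, c, d)
            if nxt:
                break
        else:
            break                      # no crack left: back where the trace began
        if d != heading:               # a turn ends the current run of bits
            vectors.append((heading, run))
            heading, run = d, 0
        run += 1
        r, c = nxt
    vectors.append((heading, run))     # the final, westbound vector
    return vectors
```

On a 2-by-2 blob the trace yields four vectors of run length two and, as the text promises, ends heading westbound; a second call finds VSEG empty and returns None.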
There are two complexities: contrast accumulation and
dekinking. The contrast of a vector is defined as (QUOTIENT
(DIFFERENCE (Sum of pixel values on one side of the vector)(Sum of
pixel values on the other side of the vector)) (length of the vector
in pixels)). Since vectors are always either horizontal or vertical
and are construed as being on the cracks between pixels; the
specified summations refer to the pixels immediately to either side
of the vector. Notice that this definition of contrast will always
give a positive contrast for vectors of a blob and negative contrast
for the vectors of a hole.
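For a vertical vector the definition reads as follows in Python (a sketch; the side labels are illustrative, and which side counts as which depends on the trace direction, giving the blob/hole sign convention noted above):

```python
def vector_contrast(image, r, c, length):
    """Contrast of a vertical vector lying on the crack between columns c-1
    and c, spanning rows r .. r+length-1: the difference of the pixel sums
    immediately to either side, divided by the length in pixels.
    Horizontal vectors are analogous."""
    one_side   = sum(image[i][c]     for i in range(r, r + length))
    other_side = sum(image[i][c - 1] for i in range(r, r + length))
    return (one_side - other_side) / length
```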
The contours that have just been traced would appear
"sawtoothed" or "kinky"; the terms "kink", "sawtooth" and "jaggy" are
used to express what seems to be wrong about the lowermost left
polygon in Figure 7.2. The problem involves doing something to a
rectilinear quantized set of segments, to make its continuous nature
more evident. In CRE, the jaggies solution (in the subroutine MKPGON)
merely positions the turning locus diagonally off its grid point
a little in the direction (northeast, northwest, southwest or
southeast) that bisects the turn's right angle. The distance of
dekink vernier positioning is always less than half a pixel, but
greater for brighter cuts and less for darker cuts, in order to
preserve the nesting of contours. The sawtoothed and the dekinked
versions of a polygon have the same number of vectors. I am very
fond of this dekinking algorithm because of its incredible
efficiency: given a north, south, east, west polygon
trace routine (which handles image coordinates packed row, column
into one word), dekinking requires only one more ADD instruction
execution per vector!
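The trick can be sketched in Python, with unbounded Python integers standing in for 36-bit PDP-10 words; the 18-bit column field and the quarter-pixel unit are illustrative choices, not the thesis's exact formats. Because the diagonal nudge is a packed constant, applying it to a packed corner coordinate is a single addition:

```python
COLBITS = 18                     # column field width (illustrative)

def pack(r, c):
    return (r << COLBITS) | c

def unpack(v):
    return v >> COLBITS, v & ((1 << COLBITS) - 1)

def packdelta(dr, dc):
    # a signed (row, column) displacement expressed as one packed addend
    return dr * (1 << COLBITS) + dc

Q = 1                            # one quarter-pixel, so the sub-pixel nudge stays integral
NUDGE = {'NE': packdelta(-Q,  Q), 'NW': packdelta(-Q, -Q),
         'SW': packdelta( Q, -Q), 'SE': packdelta( Q,  Q)}

# nudging a packed corner toward the bisector of its right-angle turn
# is one ADD; for interior corners no carry crosses the field boundary
corner = pack(20, 20)            # a turn vertex in quarter-pixel units
corner += NUDGE['NE']            # north = -row, east = +col
```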
⊂7.4 Polygon Nesting.⊃
The nesting problem is to decide whether one contour polygon
is within another. Although easy in the two-polygon case, solving
the nesting of many polygons with respect to each other becomes
n-squared expensive in either compute time or in memory space. The
nesting solution in CRE sacrifices memory for the sake of greater
speed and requires a 31K array, called the SKY. CRE's accumulation of
a properly nested tree of polygons depends on the order of threshold
cutting going from dark to light. For each polygon there are two
nesting steps: first, the polygon is placed in the tree of nested
polygons by the subroutine INTREE; second, the polygon is placed in
the SKY array by the subroutine named INSKY.
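The easy two-polygon case can be decided by an even-odd ray-crossing test; repeating it for every pair of polygons is exactly the n-squared approach that the SKY array exists to avoid. The routine below is a generic sketch of that test, not CRE's method, and it relies on the fact that contour polygons never cross (so one vertex decides containment):

```python
def inside(point, polygon):
    """Even-odd ray-crossing test: is point strictly within the closed
    polygon given as a list of (x, y) vertices?"""
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                          # edge spans the scanline
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1): # crossing lies to the east
                crossings += 1
    return crossings % 2 == 1

def surrounds(outer, inner):
    """Since contours never cross, inner nests in outer iff any one of
    inner's vertices lies inside outer."""
    return inside(inner[0], outer)
```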
{π} The SKY array is 216 rows of 289 columns of 18-bit pointers.
The name "SKY" came about because the array used to represent the
farthest-away regions or background, which in the case of a robot
vehicle is the real blue sky. The sky contains vector pointers and
would be more efficient on a virtual memory machine that didn't
allocate unused pages of its address space. Whereas most computers
have more memory containers than address space, computer graphics and
vision might be easier to program in a memory with more address space
than physical space, i.e. an almost empty virtual memory.
The first part of the INTREE routine finds the surrounder of
a given polygon by scanning the SKY due east from the uppermost left
pixel of the given polygon. The SON of a polygon is always its
uppermost left vector. After INTREE, the INSKY routine places
pointers to the vertical vectors of the given polygon into the sky
array. The second part of the INTREE routine checks for and fixes up
the case where the new polygon captures a polygon that is already
enclaved. This only happens when two or more levels of the image have
blobs that have holes. The next paragraph explains the arcane
details of fixing up the tree links of multi-level hole polygons, and
may be skipped by everyone but those who might wish to implement a
polygon nester.
Let the given polygon be named Poly; and let the surrounder
of Poly be called Exopoly; and assume that Exopoly surrounds several
enclaved polygons called "endo's", which are already in the nested
polygon tree. Also, there are two kinds of temporary lists named the
PLIST and the NLIST. There is one PLIST which is initially a list of
all the ENDO polygons on Exopoly's ENDO ring. Each endo in turn has
an NLIST which is initially empty. The subroutine INTREE re-scans
the sky array for the polygon due east (to the left) of the uppermost
left vector of each endo polygon on the PLIST (Exopoly's ENDO ring).
On such re-scanning (on behalf of, say, an Endo1), there are four
cases: <No change>: the scan returns Exopoly, which is Endo1's
original EXO. <Poly captures Endo1>: the scan returns Poly, indicating
that Endo1 has been captured by Poly. <My brother's fate>: the scan
hits an Endo2 which is not on the PLIST, which means that Endo2's EXO
is valid and is also the valid EXO of Endo1. <My fate delayed>: the scan
hits an Endo2 which is still on the PLIST, which means that Endo2's
EXO is not yet valid but when discovered it will also be Endo1's EXO,
so Endo1 is CONSed onto Endo2's NLIST. When an endo polygon's EXO has
been rediscovered, then all the polygons on that endo's NLIST are
also placed into the polygon tree at that place. All of this link
crunching machinery takes half a page of code and is not frequently
executed.
⊂7.5 Contour Segmentation.⊃
In CRE the term <segmenting> refers to the problem of
breaking a 1-D manifold (a polygon) into simple functions (arcs). The
segmenting step converts the polygons of vertical and horizontal
vectors into polygons of arcs. For the present, the term "arc" means
"linear arc", which is a line segment. Fancier arcs (circular and
cubic spline) were implemented and thrown out, mostly because they were
of no use to higher processes such as the polygon compare, which would
have to break the fancy arcs back down into linear vectors for
computing areas, inertia tensors or mere display buffers.
Segmenting is applied to each polygon of a level. To start,
a ring of two arcs is formed (a bi-gon) with one arc at the uppermost
left and the other at the lowermost right of the given vector
polygon. Next a recursive make-arc operation, MKARC, is applied to
the two initial arcs. Since the arc given to MKARC is in a one-to-one
correspondence with a doubly linked list of vectors, MKARC checks to
see whether each point on the list of vectors is close enough to the
approximating arc. MKARC returns the given arc as good enough when
all the sub vectors fall within a given width; otherwise MKARC splits
the arc in two and places a new arc vertex on the vector vertex that
was farthest away from the original arc.
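MKARC's split rule is essentially the recursion now known as Ramer-Douglas-Peucker; the sketch below works on an open chain of vertices (CRE applies it between the two seed vertices of the bi-gon, so the endpoints of each arc are distinct, which keeps the chord well defined):

```python
def mkarc(points, tol):
    """Recursive arc fitting in the spirit of MKARC: accept the chord when
    every intermediate vertex lies within tol of it; otherwise split at the
    vertex farthest from the chord and recur on both halves."""
    def dist(p, a, b):
        # perpendicular distance from p to the line through a and b (a != b)
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        return abs(dy * (px - ax) - dx * (py - ay)) / (dx * dx + dy * dy) ** 0.5
    if len(points) < 3:
        return list(points)
    d, k = max((dist(points[i], points[0], points[-1]), i)
               for i in range(1, len(points) - 1))
    if d <= tol:
        return [points[0], points[-1]]          # this arc is good enough
    left = mkarc(points[:k + 1], tol)           # split at the farthest vertex
    return left[:-1] + mkarc(points[k:], tol)   # shared vertex kept once
```

A collinear chain collapses to its endpoints, while a bend farther than the tolerance survives as a new arc vertex.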
The two large images in Figure 7.3, illustrate a polygon
smoothed with arc width tolerances set at two different widths in
order to show one recursion of MKARC. The eight smaller images
illustrate the results of setting the arc width tolerance over a
range of values. Because of the dekinking mentioned earlier the arc
width tolerance can be equal to or less than one pixel and still
yield a substantial reduction in the number of vectors it takes to
describe a contour polygon.
A final important detail is that the arc width tolerance is
actually taken as a function of the highest contrast vector found
along the arc; so that high contrast arcs are smoothed with much
smaller arc width tolerances than are low contrast arcs. After
smoothing, the contrast across each arc is computed and the ring of
arcs replaces the ring of vectors of the given polygon. (Polygons
that would be expressed as only two arcs are deleted).
⊂7.6 Related and Future Image Analysis.⊃
{π} In general, robotic image analysis should consist of three
steps: first, high quality pictures are taken continuously in time
and space; second, several low level bulk operations (such as
correlation, filtering, histogramming and thresholding) are applied
to each image and to pairs of images; third, the rasters are
converted into linked 2-D structures which are further amalgamated
into connected 3-D models. It is clear to me that my present
implementation only has fragile toy routines where rugged tools are
needed. Eventually, more kinds of image features and larger coherent
structures must be included. In particular, the contour maps should
be bundled into regional mosaics and more features should be woven
into the node/link structure.
Contour image processing is effectively surveyed by Freeman
(74) who gives the erroneous impression that contour images are the
best image representation (rasters and mosaics are equally
important). Contours are applied to recognition of silhouettes by
Dudani (70) using moments similar to those explained in the next
chapter. Finally, my own acquaintance with the contour image
representation was initially derived from papers by Zahn (66) and
Krakauer (71).